Document Retrieval and Clustering: from Principal Component Analysis to Self-aggregation Networks
Abstract
We first extend Hopfield networks to the clustering of bipartite graphs (word-to-document associations) and show that the solution is given by principal component analysis (PCA). We then generalize this result, via the min-max clustering principle, to self-aggregation networks, which are composed of scaled PCA components combined through the Hebb rule. Clustering then amounts to an updating process in which connections between different clusters are automatically suppressed while connections within the same cluster are enhanced. The framework combines dimension reduction with clustering through neural networks and PCA. Self-aggregation networks can also improve information retrieval performance. Applications are presented.
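As a rough illustration of the PCA view of bipartite clustering described in the abstract, the sketch below splits a toy word-by-document count matrix into two clusters using the second singular vectors of the degree-scaled matrix. This is a common spectral formulation and is only assumed here to resemble the scaled PCA step; it is not the authors' self-aggregation network, and the function name `bipartite_two_way_split` and the toy matrix are hypothetical.

```python
import numpy as np

def bipartite_two_way_split(B):
    """Two-way split of rows (words) and columns (documents) of B.

    Assumption: B is a non-negative association matrix with no
    all-zero row or column, and the bipartite graph is connected.
    """
    B = np.asarray(B, dtype=float)
    row_deg = B.sum(axis=1)  # total weight attached to each word
    col_deg = B.sum(axis=0)  # total weight attached to each document
    # Degree scaling: S = D_r^{-1/2} B D_c^{-1/2}
    S = B / np.sqrt(np.outer(row_deg, col_deg))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    # The leading singular pair is trivial (proportional to the square
    # roots of the degrees); the second pair carries the partition.
    word_labels = (U[:, 1] > 0).astype(int)
    doc_labels = (Vt[1, :] > 0).astype(int)
    return word_labels, doc_labels

# Toy data: words 0-1 co-occur mostly with documents 0-1, words 2-3
# with documents 2-3, plus one weak cross-link so the graph is connected.
B = np.array([
    [4, 3, 0, 0],
    [3, 4, 1, 0],
    [0, 1, 4, 3],
    [0, 0, 3, 4],
])
word_labels, doc_labels = bipartite_two_way_split(B)
print(word_labels, doc_labels)
```

The overall sign of a singular vector is arbitrary, so which group is labelled 0 and which is labelled 1 can differ between runs or platforms; only the grouping itself is meaningful.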
Similar resources
SOM-based Document Image Retrieval
In this paper we discuss some applications of word image clustering (based on Self Organizing Maps, SOM) for tasks related to document image retrieval. Two main applications are discussed: document retrieval and word retrieval. In document retrieval a document representation based on the vector model is obtained by computing the occurrences of words belonging to the SOM clusters in each documen...
Outlier Detection in Wireless Sensor Networks Using Distributed Principal Component Analysis
Detecting anomalies is an important challenge for intrusion detection and fault diagnosis in wireless sensor networks (WSNs). To address the problem of outlier detection in wireless sensor networks, in this paper we present a PCA-based centralized approach and a DPCA-based distributed energy-efficient approach for detecting outliers in sensed data in a WSN. The outliers in sensed data can be ca...
Efficient Word Retrieval by Means of SOM Clustering and PCA
We propose an approach for efficient word retrieval from printed documents belonging to Digital Libraries. The approach combines word image clustering (based on Self Organizing Maps, SOM) with Principal Component Analysis. The combination of these methods allows us to efficiently retrieve the matching words from large document collections without the need for a direct comparison of the query w...
Data Clustering: Principal Components, Hopfield and Self-Aggregation Networks
We present a coherent framework for data clustering. Starting with a Hopfield network, we show that the solutions to several well-motivated clustering objective functions are principal components. For MinMaxCut objectives, motivated by the need to ensure cluster balance, the solutions are the nonlinearly scaled principal components. Using scaled PCA, we generalize to multi-way clustering, constructing a sel...